Improved motif-scaffolding with SE(3) flow matching

Protein design often begins with knowledge of a desired function conferred by a motif, around which motif-scaffolding aims to construct a functional protein. Recently, generative models have achieved breakthrough success in designing scaffolds for a range of motifs. However, generated scaffolds tend to lack structural diversity, which can hinder success in wet-lab validation. In this work, we extend FrameFlow, an SE(3) flow matching model for protein backbone generation, to perform motif-scaffolding with two complementary approaches. The first is motif amortization, in which FrameFlow is trained with the motif as input using a data augmentation strategy. The second is motif guidance, which performs scaffolding using an estimate of the conditional score from FrameFlow without additional training. On a benchmark of 24 biologically meaningful motifs, we show our method achieves 2.5 times more designable and unique motif-scaffolds compared to state-of-the-art. Code: https://github.com/microsoft/protein-frame-flow


Introduction
A common task in protein design is to create proteins with functional properties conferred through a prespecified arrangement of residues known as a motif. The problem is to design the remainder of the protein, called the scaffold, that harbors the motif. Motif-scaffolding is widely used, with applications to vaccine and enzyme design (Procko et al., 2014; Correia et al., 2014; Jiang et al., 2008; Siegel et al., 2010). For this problem, diffusion models have greatly advanced capabilities in designing new scaffolds (Wu et al., 2023; Trippe et al., 2022; Ingraham et al., 2023). While experimental wet-lab validation is the ultimate test for evaluating a scaffold, in this work we focus on improving performance under computational validation of scaffolds, following prior works. In-silico success is defined as satisfying the designability criterion, which has been found to correlate well with wet-lab success (Wang et al., 2021). The current state-of-the-art, RFdiffusion (Watson et al., 2023), fine-tunes a pre-trained RoseTTAFold (Baek et al., 2023) neural network with SE(3) diffusion (Yim et al., 2023b) and is able to successfully scaffold the majority of motifs in a recent benchmark. However, RFdiffusion suffers from low scaffold diversity, which can hinder chances of a successful design. Moreover, the large model size and pre-training used in RFdiffusion make it slow to train and difficult to deploy on smaller machines. In this work, we present a lightweight and easy-to-train model with improved performance.
Our method adapts an existing SE(3) flow matching model, FrameFlow (Yim et al., 2023a), for motif-scaffolding. We develop two approaches: (i) motif amortization, and (ii) motif guidance, as illustrated in Fig. 1. Motif amortization simply trains a conditional model with the motif as additional input when generating the scaffold. We use data augmentation to amortize over all possible motifs in our training set and aid in generalization to new motifs. Motif guidance relies on a Bayesian approach, using an unconditional FrameFlow model to sample the scaffold residues, while the motif residues are guided at each step to their final desired positions. An unconditional model in this context is one that generates the full protein backbone without distinguishing between the motif and scaffold. Motif guidance was described in Wu et al. (2023) for SE(3) diffusion. In this work, we develop the extension to SE(3) flow matching.
The two approaches differ in whether to use a conditional model or to re-purpose an unconditional model for conditional generation. Motif guidance has the advantage that any unconditional model can be used to readily perform motif-scaffolding without the need for additional task-specific training. To provide a controlled comparison, we train unconditional and conditional versions of FrameFlow on a dataset of monomers from the Protein Data Bank (PDB) (Berman et al., 2000). Our results provide a clear comparison of the modeling choices made when performing motif-scaffolding with FrameFlow. We find that FrameFlow with both motif amortization and guidance surpasses the performance of RFdiffusion, as measured by the number of structurally unique scaffolds that pass the designability criterion.
Figure 1: We present two strategies for motif-scaffolding. Top: motif amortization trains a flow model to condition on the motif (blue) and generate the scaffold (red). During training, only the scaffold is corrupted with noise. Bottom: motif guidance re-purposes a flow model that is trained to generate the full protein for motif-scaffolding. During generation, the motif residues are guided to reconstruct the true motif at t = 1 while the flow model will adjust the scaffold trajectory to be consistent with the motif.
This work is structured as follows. Sec. 2 provides background on SE(3) flow matching. We present our main contribution extending FrameFlow for motif-scaffolding in Sec. 3. We develop motif amortization for flow matching, while motif guidance, originally developed for diffusion models, follows after drawing connections between flow matching and diffusion models. Next we discuss related works in Sec. 4 and present empirical results in Sec. 5. Our contributions are the following:
• We extend FrameFlow with two fundamentally different approaches for motif-scaffolding: motif amortization and motif guidance. We are the first to extend conditional generation techniques with SE(3) flow matching and apply them to motif-scaffolding. With all other settings kept constant, we perform an empirical study of how each approach performs.
• On a benchmark of biologically meaningful motifs, we show our method can successfully scaffold 20 out of 24 motifs in the motif-scaffolding benchmark, which matches the previous state-of-the-art, while achieving 2.5 times more unique, designable scaffolds. Our results demonstrate the importance of measuring diversity to detect mode collapse.

Flow matching on Riemannian manifolds
On a manifold $\mathcal{M}$, a continuous normalizing flow (CNF) $\phi_t(\cdot) : \mathcal{M} \to \mathcal{M}$ is defined via an ODE along a time-dependent vector field $v(z, t) : \mathcal{M} \times \mathbb{R} \to \mathcal{T}_z\mathcal{M}$, where $\mathcal{T}_z\mathcal{M}$ is the tangent space of the manifold at $z \in \mathcal{M}$ and time is $t \in [0, 1]$:
$$\frac{d}{dt}\phi_t(z_0) = v(\phi_t(z_0), t), \quad \phi_0(z_0) = z_0. \qquad (1)$$
Starting with $z_0 \sim p_0$ from an easy-to-sample prior distribution $p_0$, simulating samples according to Eq. (1) induces a new distribution referred to as the push-forward $p_t = [\phi_t]_* p_0$. One wishes to find a vector field $v$ such that the push-forward $p_{t=1} = [\phi_{t=1}]_* p_0$ (at $t = 1$) matches the data distribution $p_1$. Such a vector field $v$ is in general not available in closed form, but can be learned by regressing conditional vector fields $u(z_t, t|z_1) = \frac{d}{dt} z_t$, where $z_t = \phi_t(z_0|z_1)$ interpolates between endpoints $z_0 \sim p_0$ and $z_1 \sim p_1$. A natural choice for $z_t$ is the geodesic path: $z_t = \exp_{z_0}\left(t \log_{z_0}(z_1)\right)$, where $\exp_{z_0}$ and $\log_{z_0}$ are the exponential and logarithmic maps at the point $z_0$. The conditional vector field takes the following form: $u(z_t, t|z_1) = \log_{z_t}(z_1)/(1 - t)$.
The key insight of conditional flow matching (CFM) (Lipman et al., 2023) is that training a neural network $\hat{v}$ to regress the conditional vector field $u$ is equivalent to learning the unconditional vector field $v$. This corresponds to minimizing
$$\mathcal{L} = \mathbb{E}_{\mathcal{U}(t;[0,1]),\, p_1(z_1),\, p_0(z_0)}\left[\left\| u(z_t, t|z_1) - \hat{v}(z_t, t) \right\|_g^2\right], \qquad (2)$$
where $\mathcal{U}(t; [0, 1])$ is the uniform distribution for $t \in [0, 1]$ and $\|\cdot\|_g^2$ is the norm induced by the Riemannian metric $g : \mathcal{T}\mathcal{M} \times \mathcal{T}\mathcal{M} \to \mathbb{R}$. Samples can then be generated by integrating the ODE in Eq. (1) with Euler steps using the learned vector field $\hat{v}$ in place of $v$.
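To make the interpolant, the conditional vector field, and Euler sampling concrete, the following is a minimal sketch in flat Euclidean space, where the geodesic is a straight line and $\log_{z_t}(z_1) = z_1 - z_t$. It is only an illustration under that simplifying assumption: the function names are hypothetical, and the exact conditional vector field stands in for a trained network $\hat{v}$.

```python
import random

def interpolate(z0, z1, t):
    """Geodesic interpolant in R^d (a straight line): z_t = (1 - t) z0 + t z1."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]

def conditional_vector_field(zt, z1, t):
    """u(z_t, t | z1) = log_{z_t}(z1) / (1 - t); in R^d the log map is z1 - z_t."""
    return [(b - a) / (1.0 - t) for a, b in zip(zt, z1)]

def euler_sample(z0, vector_field, num_steps=100):
    """Integrate dz/dt = v(z, t) from t = 0 to t = 1 with explicit Euler steps."""
    z, dt = list(z0), 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt  # the last evaluated time is 1 - dt, so 1 - t never hits zero
        v = vector_field(z, t)
        z = [a + dt * b for a, b in zip(z, v)]
    return z

random.seed(0)
z0 = [random.gauss(0.0, 1.0) for _ in range(3)]  # sample from a Gaussian prior
z1 = [1.0, -2.0, 0.5]                            # toy "data" endpoint
zt = interpolate(z0, z1, 0.3)
u = conditional_vector_field(zt, z1, 0.3)        # equals z1 - z0 along the path
# Stand-in for a learned field: the exact conditional field towards z1.
z_end = euler_sample(z0, lambda z, t: conditional_vector_field(z, z1, t))
```

Along the straight-line path the conditional vector field is constant and equal to $z_1 - z_0$, so Euler integration transports $z_0$ to $z_1$ up to floating-point error. On a curved manifold, the subtraction and addition would be replaced by the logarithmic and exponential maps.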

Generative modeling on protein backbones
The atom positions of each residue in a protein backbone can be parameterized by an element $T \in SE(3)$ of the special Euclidean group $SE(3)$ (Jumper et al., 2021; Yim et al., 2023b). We refer to $T = (r, x)$ as a (local) frame consisting of a rotation $r \in SO(3)$ and translation vector $x \in \mathbb{R}^3$. The protein backbone is made of $N$ residues, meaning it can be parameterized by $N$ frames denoted as $\mathbf{T} = [T^{(1)}, \dots, T^{(N)}] \in SE(3)^N$. We use bold face to refer to vectors of all the residues, superscripts to refer to residue indices, and subscripts to refer to time. Details of the $SE(3)^N$ backbone parameterization can be found in App. B.1.
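We use SE(3) flow matching to parameterize a generative model over the $SE(3)^N$ representation of protein backbones. The application of Riemannian flow matching to SE(3) was previously developed in Yim et al. (2023a); Bose et al. (2023). Endowing SE(3) with the product left-invariant metric, the SE(3) manifold effectively behaves as the product manifold $SE(3) = SO(3) \times \mathbb{R}^3$ (App. D.3 of Yim et al. (2023b)). The vector field over SE(3) can then be decomposed as $v^{(n)}_{SE(3)}(\cdot, t) = \left(v^{(n)}_{\mathbb{R}}(\cdot, t),\, v^{(n)}_{SO(3)}(\cdot, t)\right)$. Our goal is to train a neural network to parameterize the learned vector fields,
$$\hat{v}^{(n)}_{\mathbb{R}}(\mathbf{T}_t, t) = \frac{\hat{x}^{(n)}_1(\mathbf{T}_t) - x^{(n)}_t}{1 - t}, \qquad (3)$$
$$\hat{v}^{(n)}_{SO(3)}(\mathbf{T}_t, t) = \frac{\log_{r^{(n)}_t}\left(\hat{r}^{(n)}_1(\mathbf{T}_t)\right)}{1 - t}. \qquad (4)$$
The outputs of the neural network are denoised predictions $\hat{x}^{(n)}_1$ and $\hat{r}^{(n)}_1$, which are used to calculate the vector fields in Eqs. (3) and (4). The loss becomes
$$\mathcal{L}_{SE(3)} = \mathbb{E}\left[\left\| u_{\mathbb{R}}(\mathbf{x}_t, t|\mathbf{x}_1) - \hat{v}_{\mathbb{R}}(\mathbf{T}_t, t) \right\|^2_{\mathbb{R}} + \left\| u_{SO(3)}(\mathbf{r}_t, t|\mathbf{r}_1) - \hat{v}_{SO(3)}(\mathbf{T}_t, t) \right\|^2_{SO(3)}\right], \qquad (5)$$
where the expectation is taken over $\mathcal{U}(t; [0, 1])$, $p_1(\mathbf{T}_1)$, $p_0(\mathbf{T}_0)$. We have used bold face for collections of elements, i.e. $\hat{v}(\cdot) = [\hat{v}^{(1)}(\cdot), \dots, \hat{v}^{(N)}(\cdot)]$. Our prior is chosen as $p_0(\mathbf{T}_0) = \mathcal{U}(SO(3))^N \otimes \overline{\mathcal{N}}(0, I_3)^N$, where $\mathcal{U}(SO(3))$ is the uniform distribution over $SO(3)$ and $\overline{\mathcal{N}}(0, I_3)$ is the isotropic Gaussian where samples are centered to the origin. The architecture and hyperparameters of SE(3) flow matching closely follow FrameFlow (Yim et al., 2023a) and are provided in App. B.2.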
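As an illustration of the prior described above, the sketch below draws one rotation and one translation per residue and centers the translations at the origin. It is a minimal standard-library example under stated assumptions: rotations are represented as unit quaternions (a normalized 4D Gaussian sample, which is uniform over rotations), and the function names are hypothetical rather than taken from the FrameFlow code.

```python
import math
import random

def sample_uniform_rotation(rng):
    """Uniform rotation as a unit quaternion: normalize a 4D standard Gaussian."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]

def sample_centered_gaussian(num_residues, rng):
    """Isotropic Gaussian translations with the center of mass moved to the origin."""
    xs = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(num_residues)]
    com = [sum(x[d] for x in xs) / num_residues for d in range(3)]
    return [[x[d] - com[d] for d in range(3)] for x in xs]

def sample_prior(num_residues, seed=0):
    """Draw N frames T_0 = (r_0, x_0): uniform rotations, centered Gaussian translations."""
    rng = random.Random(seed)
    rotations = [sample_uniform_rotation(rng) for _ in range(num_residues)]
    translations = sample_centered_gaussian(num_residues, rng)
    return rotations, translations

rotations, translations = sample_prior(num_residues=10, seed=1)
```

Centering removes the global translation degree of freedom, which is what allows the model to work in the zero center-of-mass subspace.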

Motif-scaffolding with FrameFlow
We describe our two strategies for performing motif-scaffolding with the FrameFlow model: motif amortization (Sec. 3.1) and motif guidance (Sec. 3.2). Recall the full protein backbone is given by $\mathbf{T} = \{T^{(1)}, T^{(2)}, \dots, T^{(N)}\} \in SE(3)^N$. The residues can be separated into the motif $\mathbf{T}^M = \{T^{(i_1)}, \dots, T^{(i_k)}\}$ of length $k$, where $\{i_1, \dots, i_k\} \subset \{1, \dots, N\}$ are motif residue indices, and the scaffold $\mathbf{T}^S$, which is all the remaining residues, such that $\mathbf{T} = \mathbf{T}^M \cup \mathbf{T}^S$. The task can then be framed as the problem of sampling from the conditional distribution $p(\mathbf{T}^S|\mathbf{T}^M)$.
Figure 2: Motif data augmentation. Each protein in the dataset does not come with pre-defined motif-scaffold annotations. Instead, we construct plausible motifs at random to simulate sampling from the distribution of motifs and scaffolds.

Motif amortization
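We train a variant of FrameFlow that additionally takes the motif as input when generating scaffolds (while keeping the motif fixed). Formally, we model a motif-conditioned CNF via the following ODE,
$$\frac{d}{dt}\phi_t(\mathbf{T}^S_0|\mathbf{T}^M) = v\left(\phi_t(\mathbf{T}^S_0|\mathbf{T}^M), t|\mathbf{T}^M\right), \quad \phi_0(\mathbf{T}^S_0|\mathbf{T}^M) = \mathbf{T}^S_0. \qquad (6)$$
The flow $\phi_t$ transforms a prior density over scaffolds along time, inducing a density $p_t(\cdot|\mathbf{T}^M) = [\phi_t]_* p_0(\cdot|\mathbf{T}^M)$. We use the same prior as in Sec. 2.2: $p_0(\mathbf{T}^S_0|\mathbf{T}^M) = \mathcal{U}(SO(3))^{N-k} \otimes \overline{\mathcal{N}}(0, I_3)^{N-k}$. FrameFlow is trained to predict the conditional vector field $u(\mathbf{T}^S_t, t|\mathbf{T}^S_1, \mathbf{T}^M)$, where $\mathbf{T}^S_t$ is defined by interpolating along the geodesic path, $\mathbf{T}^S_t = \exp_{\mathbf{T}^S_0}\left(t \log_{\mathbf{T}^S_0}(\mathbf{T}^S_1)\right)$. The implication is that $u$ is conditionally independent of the motif $\mathbf{T}^M$ given $\mathbf{T}^S_1$. This simplifies our formulation to $u(\mathbf{T}^S_t, t|\mathbf{T}^S_1, \mathbf{T}^M) = u(\mathbf{T}^S_t, t|\mathbf{T}^S_1)$, which is defined in Sec. 2.2. However, when we learn the vector field, the model needs to condition on $\mathbf{T}^M$ since the motif placement $\mathbf{T}^M$ contains information on the true scaffold positions $\mathbf{T}^S_1$. The training loss becomes,
$$\mathbb{E}\left[\left\| u_{SE(3)}(\mathbf{T}^S_t, t|\mathbf{T}^S_1) - \hat{v}_{SE(3)}(\mathbf{T}^S_t, t|\mathbf{T}^M) \right\|^2_{SE(3)}\right], \qquad (7)$$
where the expectation is taken over $\mathcal{U}(t; [0, 1])$, $p(\mathbf{T}^M)$, $p_1(\mathbf{T}^S_1|\mathbf{T}^M)$, and $p_0(\mathbf{T}^S_0|\mathbf{T}^M)$. The above expectation requires access to the motif and scaffold distributions, $p(\mathbf{T}^M)$ and $p_1(\mathbf{T}^S_1|\mathbf{T}^M)$, during training. Future work can look into incorporating known motif-scaffolds such as the CDR loops on antibodies (Dunbar et al., 2014). While some labels exist for which residues correspond to the functional motif, the vast majority of protein structures in the PDB do not have labels. We instead utilize unlabeled PDB structures to perform data augmentation (see Sec. 3.1.1) that allows sampling a wide range of motifs and scaffolds.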
To learn the motif-conditioned vector field $\hat{v}_t$, we use the FrameFlow architecture with a 1D mask as additional input, with a 1 at the location of the motif and 0 elsewhere. To maintain SE(3)-equivariance, we zero-center the motif and the initial noise sample from $p_0(\mathbf{T}^S_0|\mathbf{T}^M)$. Zero-centering the motif also prevents the model from using the motif offset from the origin to memorize scaffold locations, which helps generalization.

Data augmentation.
The flow matching loss from Eq. (7) involves sampling from $p(\mathbf{T}^M)$ and $p_1(\mathbf{T}^S_1|\mathbf{T}^M)$, which we do not have access to, but which can be approximated using unlabeled structures from the PDB. Our pseudo-labeled motifs and scaffolds are generated as follows (also depicted in Fig. 2). First, a protein structure is sampled from the PDB dataset. Second, a random number of residues are selected to be the starting locations of each motif. Third, additional residues are appended onto each motif, thereby extending their lengths. The length of each motif is randomly sampled such that the total number of motif residues is between a fraction $\gamma_{\min}$ and $\gamma_{\max}$ of all the residues. We use $\gamma_{\min} = 0.05$ and $\gamma_{\max} = 0.5$ to ensure at least a few residues are used as the motif but not more than half the protein. Finally, the remaining residues are treated as the scaffold and corrupted. The motif and scaffold are treated as samples from $p(\mathbf{T}^M)$ and $p_1(\mathbf{T}^S_1|\mathbf{T}^M)$ respectively. Importantly, each protein will be re-used on subsequent epochs, where new motifs and scaffolds will be sampled. Our pseudo motif-scaffolds cover a wide range of scenarios, including multiple motifs of different lengths.
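The sampling procedure above can be sketched as a function that returns a binary motif mask for a protein of a given length. This is a simplified illustration, not the paper's Algorithm 1: the function names, the `max_motifs` cap, and the way segments are placed (motif residues split into contiguous segments, scaffold residues scattered over the gaps around them) are assumptions made to keep the example short.

```python
import random

def random_partition(total, parts, rng):
    """Split `total` into `parts` positive integers that sum to `total`."""
    cuts = sorted(rng.sample(range(1, total), parts - 1))
    bounds = [0] + cuts + [total]
    return [b - a for a, b in zip(bounds, bounds[1:])]

def sample_motif_mask(num_residues, gamma_min=0.05, gamma_max=0.5,
                      max_motifs=5, rng=random):
    """Return a 0/1 list with 1 at pseudo-motif residues and 0 at scaffold residues."""
    # Total motif size as a random fraction of the protein length.
    gamma = rng.uniform(gamma_min, gamma_max)
    num_motif = min(max(1, int(gamma * num_residues)), num_residues - 1)
    # Random number of contiguous motif segments with random lengths.
    num_segments = rng.randint(1, min(max_motifs, num_motif))
    seg_lengths = random_partition(num_motif, num_segments, rng)
    # Scatter scaffold residues over the gaps before, between, and after
    # the segments. Gaps may be empty, in which case neighbours merge.
    gaps = [0] * (num_segments + 1)
    for _ in range(num_residues - num_motif):
        gaps[rng.randrange(num_segments + 1)] += 1
    mask = []
    for gap, seg in zip(gaps, seg_lengths + [0]):
        mask += [0] * gap + [1] * seg
    return mask

# A new mask is drawn every time a protein is revisited during training.
mask = sample_motif_mask(100, rng=random.Random(0))
```

Calling the function again with a different random state for the same protein yields a different motif-scaffold split, which mirrors the re-sampling across epochs described above.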
The lack of functional annotations in the PDB requires training over all possible motif-scaffold annotations so that our method can handle the new situations it may encounter in real-world use. In our experiments, we evaluate how this data augmentation strategy transfers to real motif-scaffolding tasks. A similar strategy is used in image infilling, where image-based diffusion models are trained to infill randomly masked crops of images to approximate real image infilling scenarios (Saharia et al., 2022). Motif-scaffolding data augmentation was mentioned in RFdiffusion but without algorithmic detail. Since RFdiffusion does not release training code, we implemented our own data augmentation algorithm in Algorithm 1.

Motif guidance
We now present an alternative Bayesian approach to motif-scaffolding that does not involve learning a motif-conditioned flow model. As such, it does not require having access to motifs at training time, but only at sampling time. This can be useful when an unconditional generative flow model is already available and additional training is too costly. The idea behind motif guidance, first described as a special case of TDS (Wu et al., 2023) using diffusion models, is to use the desired motif $\mathbf{T}^M$ to bias the model's generative trajectory such that the motif residues end up in their known positions. The scaffold residues follow a trajectory that creates a consistent whole protein backbone, thus achieving motif-scaffolding.
The key insight comes from connecting flow matching to diffusion models, to which motif guidance can be applied. The following ODE describes the relationship between the vector field $\hat{v}$ in flow models (learned by minimizing the CFM objective in Eq. (5)) and the Stein score $\nabla \log p_t(\mathbf{T}_t)$,
$$d\mathbf{T}_t = \hat{v}(\mathbf{T}_t, t)\,dt = \left[f(\mathbf{T}_t, t) - \tfrac{1}{2} g(t)^2 \nabla \log p_t(\mathbf{T}_t)\right] dt. \qquad (8)$$
The gradient is taken with respect to the backbone at time $t$, which we omit for brevity, i.e. $\nabla = \nabla_{\mathbf{T}_t}$. Eq. (8) shows the ODE used to sample from flow models can be written as the probability flow ODE used in diffusion models (Song et al., 2020) with $f$ and $g$ as the drift and diffusion coefficients. The derivation of Eq. (8) requires standard linear algebra and calculus for our choice of vector field (see App. D).
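Our goal is to sample from the conditional $p(\mathbf{T}|\mathbf{T}^M)$, from which we can extract $p(\mathbf{T}^S|\mathbf{T}^M)$. The benefit of Eq. (8) is that we can manipulate the score term to achieve this goal. We modify the above to be conditioned on the motif $\mathbf{T}^M$, followed by an application of Bayes rule, $\nabla \log p_t(\mathbf{T}_t|\mathbf{T}^M) = \nabla \log p_t(\mathbf{T}_t) + \nabla \log p_t(\mathbf{T}^M|\mathbf{T}_t)$:
$$d\mathbf{T}_t = \left[f(\mathbf{T}_t, t) - \tfrac{1}{2} g(t)^2 \nabla \log p_t(\mathbf{T}_t|\mathbf{T}^M)\right] dt = \left[\hat{v}(\mathbf{T}_t, t) - \tfrac{1}{2} g(t)^2 \nabla \log p_t(\mathbf{T}^M|\mathbf{T}_t)\right] dt. \qquad (9)$$
We can interpret Eq. (9) as doing unconditional generation by following $\hat{v}_{SE(3)}(\mathbf{T}_t, t)$ while $\nabla \log p_t(\mathbf{T}^M|\mathbf{T}_t)$ guides the noised residues so as to be consistent with the true motif. Doob's h-transform ensures Eq. (9) will sample from $p(\mathbf{T}|\mathbf{T}^M)$ (Didi et al., 2023). The conditional score $\nabla \log p_t(\mathbf{T}^M|\mathbf{T}_t)$ is unknown, yet it can be approximated by marginalising out $\mathbf{T}_1$ and using the neural network's denoised output (Song et al., 2022; Chung et al., 2022; Wu et al., 2023),
$$\nabla \log p_t(\mathbf{T}^M|\mathbf{T}_t) = \nabla \log \int p(\mathbf{T}^M|\mathbf{T}_1)\, p_{1|t}(\mathbf{T}_1|\mathbf{T}_t)\, d\mathbf{T}_1 \approx \nabla \log p\left(\mathbf{T}^M|\hat{\mathbf{T}}^M_1(\mathbf{T}_t)\right). \qquad (12)$$
We are free to define the likelihood in Eq. (12) so that it assigns higher probability the closer the predicted motif $\hat{\mathbf{T}}^M_1(\mathbf{T}_t)$ is to the desired motif, i.e. a likelihood that decays with the distance from the desired motif. Following SE(3) flow matching, Eq. (9) factorizes into translation and rotation components. Plugging $p(\mathbf{T}^M|\hat{\mathbf{T}}^M_1(\mathbf{T}_t))$ into Eq. (9), we arrive at an ODE, with separate updates for translations and rotations, from which we may sample $p(\mathbf{T}|\mathbf{T}^M)$. Here $\omega_t$ is a hyperparameter that controls the magnitude of the guidance towards the desired motif, which we set as done in Pokle et al. (2023); Song et al. (2021). While different choices of $g(t)$ are possible, Pokle et al. (2023) proposed to use $g(t) = (1 - t)/t$, with the motivation that this matches the diffusion coefficient for the diffusion SDE that matches the marginals of the flow ODE. For completeness, we provide the proof for $g(t)$ in App. D. A similar calculation is non-trivial for SO(3), hence we use the same $g(t)$ as a reasonable heuristic and observe good performance, as done in Wu et al. (2023).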

Related work
Conditional diffusion and flows. The development of conditional generation methods for diffusion and flow models is an active area of research. Two popular diffusion techniques that have been extended to flow matching are classifier-free guidance (CFG) (Dao et al., 2023; Ho & Salimans, 2022; Zheng et al., 2023) and reconstruction guidance (Pokle et al., 2023; Ho et al., 2022; Song et al., 2022; Chung et al., 2022). Motif guidance is an application of reconstruction guidance for motif-scaffolding. Motif amortization is most related to data-dependent couplings (Albergo et al., 2023), where a flow is learned with conditioning on partial data.
Motif-scaffolding. Wang et al. (2021) first formulated motif-scaffolding using deep learning. SMCDiff (Trippe et al., 2022) was the first proposed diffusion model for motif-scaffolding using Sequential Monte Carlo (SMC). Twisted Diffusion Sampler (TDS) (Wu et al., 2023) later improved upon SMCDiff using reconstruction guidance for each particle in SMC. Our motif guidance method follows from TDS (with one particle) by deriving the equivalent guidance vector field from its conditional score counterpart. RFdiffusion (Watson et al., 2023) fine-tunes a pre-trained neural network with motif-conditioned diffusion training. Our FrameFlow-amortization approach in principle follows RFdiffusion's diffusion training, but differs in (i) using flow matching, (ii) not relying on expensive pre-training, and (iii) using a 3× smaller neural network. Didi et al. (2023) provides a survey of structure-based motif-scaffolding methods while proposing Doob's h-transform for motif-scaffolding. EvoDiff (Alamdari et al., 2023) differs in using a sequence-based diffusion model that performs motif-scaffolding with language model-style masked generation, but its performance falls short of RFdiffusion and TDS.

Experiments
In this section, we report the results of training FrameFlow for motif-scaffolding. Sec. 5.1 describes training, sampling, and metrics. Our main results on motif-scaffolding are reported in Sec. 5.2 on the benchmark introduced in RFdiffusion. Additional motif-scaffolding analysis is provided in App. G.

Set-up
Training. We train two FrameFlow models. FrameFlow-amortization is trained with motif amortization as described in Sec. 3.1 with data augmentation using hyperparameters: $\gamma_{\min} = 0.05$ so the motif is never degenerately small and $\gamma_{\max} = 0.5$ to avoid the motif being the majority of the backbone. FrameFlow-guidance, to be used in motif guidance, is trained unconditionally on full backbones. Since unconditional generation is not our focus, we leave the unconditional performance to App. F, where we see the performance is slightly worse than RFdiffusion; as we will see, the motif-scaffolding performance is better. Both models are trained using the filtered PDB monomer dataset introduced in FrameDiff. We use the Adam optimizer (Kingma & Ba, 2014) with learning rate 0.0001. We train each model for 6 days on 2 NVIDIA A6000 GPUs with dynamic batch sizes depending on the length of the proteins in each batch, a technique from FrameDiff.
Figure 3: Motif-scaffolding results. Top plot: RFdiffusion achieves the most designable scaffolds amongst all methods in 9/24 test motifs compared to FrameFlow-amortization's 7/24 and TDS' 6/24; 2/24 are ties. Bottom plot: However, we observe that RFdiffusion produces the highest number of unique designable scaffolds for only 2 out of the 24 test motifs. Therefore, previous approaches that only measure designability (top plot) may be misleading, since generative models with the best designability can also be repeatedly sampling similar scaffolds. This demonstrates the need to measure diversity alongside designability and use the number of unique designable scaffolds as the metric of success.
Sampling. We use the Euler-Maruyama integrator with 500 timesteps for all sampling. Following the motif-scaffolding benchmark proposed in RFdiffusion, we sample 100 scaffolds for each of the 24 monomer motifs. For each motif, the method must sample novel scaffolds with different lengths and different motif locations along the sequence. The benchmark measures how well a method can generalize beyond the native scaffolds for a set of biologically important motifs.
Hyperparameters. Our hyperparameters for neural network architecture, optimizer, and sampling steps all follow the best settings found in FrameFlow (Yim et al., 2023a). We leave hyperparameter search as future work since it is not the focus of this work.

Motif-scaffolding results
Baselines. We consider RFdiffusion and the Twisted Diffusion Sampler (TDS) as baselines. RFdiffusion's performance is reported based on their published samples. TDS reported motif-scaffolding results with arbitrary scaffold lengths that deviated from the benchmark. Therefore, we re-ran TDS with their best settings using k = 8 particles on the RFdiffusion benchmark. We refer to FrameFlow-amortization as our results with motif amortization, while FrameFlow-guidance uses motif guidance.
Metrics. Previously, motif-scaffolding was only evaluated through samples passing designability (Des.). For a description of designability see App. E. Within the set of designable scaffolds, we also calculate the diversity (Div.) as the number of structurally unique clusters. This is crucial since designability can be manipulated to have a 100% success rate by always sampling the same scaffold with trivial changes. In real-world scenarios, diversity is desired to gain the most informative feedback from expensive wet-lab experiments (Yang et al., 2019). Thus diversity provides an additional data point to check for mode collapse, where the model is sampling the same scaffold repeatedly. Clusters are computed using MaxCluster (Herbert & Sternberg, 2008) with TM-score threshold set to 0.5.
Figure 4: FrameFlow-amortization diversity. In blue is the motif while red is the scaffold. For each motif (1QJG, 1YCR, 5TPN), we show FrameFlow-amortization can generate scaffolds of different lengths and various secondary structure elements for the same motif. Each scaffold is in a unique cluster to showcase the samples' structural diversity.
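To illustrate how the diversity number relates to designability, the sketch below counts clusters among designable samples only. It is a hedged stand-in: the paper uses MaxCluster, whose clustering algorithm is not reproduced here, and the greedy leader clustering, the function name, and the `tm_score` callable are assumptions for illustration.

```python
def count_designable_clusters(designable, tm_score, threshold=0.5):
    """Count structurally unique clusters among designable samples.

    designable: list of booleans, one per sampled scaffold.
    tm_score:   callable (i, j) -> pairwise TM-score between samples i and j.
    A sample joins an existing cluster if its TM-score to that cluster's
    representative is at least `threshold`; otherwise it starts a new cluster.
    """
    representatives = []
    for idx, is_designable in enumerate(designable):
        if not is_designable:
            continue  # non-designable samples never count towards diversity
        if not any(tm_score(idx, rep) >= threshold for rep in representatives):
            representatives.append(idx)
    return len(representatives)

# Toy example with a hand-written (made-up) symmetric TM-score matrix.
toy_tm = [
    [1.0, 0.8, 0.9, 0.3],
    [0.8, 1.0, 0.7, 0.2],
    [0.9, 0.7, 1.0, 0.4],
    [0.3, 0.2, 0.4, 1.0],
]
toy_designable = [True, True, False, True]
num_clusters = count_designable_clusters(
    toy_designable, lambda i, j: toy_tm[i][j])
```

In the toy example, samples 0 and 1 are both designable but structurally similar, so they contribute a single cluster; sample 3 forms a second cluster, and sample 2 is ignored because it is not designable.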
Benchmark. Fig. 3 shows how the methods compare in designability and diversity on each motif of the motif-scaffolding benchmark. While it appears RFdiffusion gets many successful scaffolds, the number of unique scaffolds is far lower than for both our FrameFlow approaches. TDS achieves fewer designable scaffolds on average, but demonstrates strong performance on a small subset of motifs. There are some motifs that only RFdiffusion can solve (7MRX_85, 7MRX_128), while FrameFlow is able to solve cases RFdiffusion cannot (1QJG, 4JHW, 5YUI). Tab. 1 provides the number of motifs each method solves (meaning at least one designable scaffold is sampled) and the total number of designable clusters sampled across all motifs. Here we see each method can solve 19-20 motifs, but FrameFlow-amortization can achieve nearly double the number of unique scaffolds (clusters) as RFdiffusion. FrameFlow-amortization outperforms FrameFlow-guidance on diversity.
A potential reason for the improved diversity is the use of SE(3) flow matching in the unconditional model, whereas TDS uses SE(3) diffusion (Yim et al., 2023b). Bose et al. (2023) found SE(3) flow matching to provide far better designability and diversity than its diffusion counterpart. Empirically, flow matching has been reported to outperform diffusion on Riemannian manifolds (Chen & Lipman, 2023).
In the last column we give the number of seconds to sample a length 100 protein on an NVIDIA A6000 GPU with each method. Both FrameFlow methods are significantly faster than RFdiffusion and TDS. TDS is notably slower since its run time scales with its number of particles. We conclude that FrameFlow-amortization matches RFdiffusion and TDS on the number of solved motifs while achieving much higher diversity and faster inference.
Diversity analysis. To visualize the diversity of the scaffolds, Fig. 4 shows several of the clusters for motifs 1QJG, 1YCR, and 5TPN, where FrameFlow can generate significantly more clusters than RFdiffusion. Each scaffold demonstrates a wide range of secondary structure elements across multiple lengths. To quantify this in more depth, Fig. 5 plots the helical and strand compositions (computed with DSSP (Kabsch & Sander, 1983)) of designable motif-scaffolds from FrameFlow-amortization compared to RFdiffusion. We see FrameFlow-amortization achieves a better spread of secondary structure components than RFdiffusion. A potential reason for RFdiffusion's overall lower diversity is its lack of secondary structure diversity: it tends to sample mostly helical structures. App. G provides additional analysis of the FrameFlow motif-scaffolding results. We conclude FrameFlow-amortization achieves much more structural diversity than RFdiffusion.

Discussion
In this work, we present two methods building on FrameFlow for tackling motif-scaffolding. These methods can be used with any flow-based model. First, with motif amortization we adapt the training of FrameFlow to additionally be conditioned on the motif, in effect turning FrameFlow into a conditional generative model. Second, with motif guidance, we use an unconditionally trained FrameFlow for the task of motif-scaffolding without any additional task-specific training. We empirically evaluated both approaches, FrameFlow-amortization and FrameFlow-guidance, on the motif-scaffolding benchmark from RFdiffusion, where we find both methods achieve competitive results with state-of-the-art methods. Moreover, they are able to sample more unique scaffolds and achieve higher diversity. It is important to note amortization and guidance are complementary techniques. Amortization outperforms guidance but requires conditional training, while guidance can use unconditional flow models without further training. Guidance generally performs worse due to approximation error in Eq. (12) from using an unconditional model in a conditional task. We stress the need to report both success rate and diversity to detect when a model suffers from mode collapse. Lastly, we caveat that all our results and metrics are computational, which may not necessarily transfer to wet-lab success.
Future directions. We have extended FrameFlow for motif-scaffolding; further extensions include binder, enzyme, and symmetric design, all of which RFdiffusion can currently achieve. For these capabilities, we require extending FrameFlow to handle multimeric proteins. While motif guidance does not outperform motif amortization, it is possible that extending TDS to flow matching could close that gap. Related to guidance, one could explore conditioning mechanisms to control properties of the scaffold such as its secondary structure. We make use of a heuristic for Riemannian reconstruction guidance that may be further improved. Despite our progress, there still remain areas of improvement to achieve success in all 25 motifs in the benchmark.
Symmetries. We perform all modelling within the zero center of mass (CoM) subspace of $\mathbb{R}^{N \times 3}$ as in Yim et al. (2023b). This entails simply subtracting the CoM from the prior sample $\mathbf{x}_0$ and all datapoints $\mathbf{x}_1$. As $\mathbf{x}_t$ is a linear interpolation between the noise sample and data, $\mathbf{x}_t$ will also have zero CoM. This guarantees that the distribution of sampled frames that the model generates is SE(3)-invariant. To see this, note that the prior distribution is SE(3)-invariant and the learned vector field $\hat{v}_{SE(3)}$ is equivariant because we use an SE(3)-equivariant architecture. Hence by Köhler et al. (2020), the push-forward of the prior under the flow is invariant.
Auxiliary losses. We use the same auxiliary losses as Yim et al. (2023a).

SO(3) inference scheduler.
The conditional flow in Eq. (16) uses a constant linear interpolation along the geodesic path, where the distance of the current point $x$ to the endpoint $x_1$ is given by a pre-metric $d_g : \mathcal{M} \times \mathcal{M} \to \mathbb{R}$ induced by the Riemannian metric $g$ on the manifold. To see this, we first recall that the general form of the conditional vector field with $x, x_1 \in \mathcal{M}$ is given as follows (Chen & Lipman, 2023),
$$u_t(x|x_1) = \frac{d \log \kappa(t)}{dt}\, d_g(x, x_1)\, \frac{\nabla d_g(x, x_1)}{\|\nabla d_g(x, x_1)\|_g^2},$$
with $\kappa(t)$ a monotonically decreasing differentiable function satisfying $\kappa(0) = 1$ and $\kappa(1) = 0$, referred to as the interpolation rate. Then, plugging in the linear schedule $\kappa(t) = 1 - t$ and taking $d_g$ to be the geodesic distance, we recover Eq. (17):
$$u_t(x|x_1) = -\frac{1}{1 - t}\, d_g(x, x_1)\, \frac{\nabla d_g(x, x_1)}{\|\nabla d_g(x, x_1)\|_g^2} = \frac{\log_x(x_1)}{1 - t}.$$
However, we found this interpolation rate to perform poorly for SO(3) at inference time. Instead, we utilize an exponential scheduler $\kappa(t) = e^{-ct}$ for some constant $c$. The intuition is that for high $c$, the rotations accelerate towards the data faster than the translations, which evolve according to the linear schedule. The SO(3) conditional flow in Eq. (16) and vector field in Eq. (17) become the following with the exponential schedule,
$$r_t = \exp_{r_0}\left(\left(1 - e^{-ct}\right) \log_{r_0}(r_1)\right), \qquad v(r_t, t|r_1) = c \log_{r_t}(r_1). \qquad (28)$$
We find $c = 10$ or $5$ to work well and use $c = 10$ in our experiments. Interestingly, we found the best performance when $\kappa(t) = 1 - t$ was used for SO(3) during training while $\kappa(t) = e^{-ct}$ is used during inference. We found using $\kappa(t) = e^{-ct}$ during training made training too easy with little learning happening.
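The effect of the two schedules can be checked numerically by tracking only the geodesic distance to the target, which under either schedule evolves as $\frac{d}{dt} d_t = \frac{d \log \kappa(t)}{dt}\, d_t$. The sketch below is a one-dimensional illustration under that reduction, not an SO(3) implementation; the function names are hypothetical, and the 500 Euler steps mirror the sampling setting quoted in the experiments.

```python
import math

def linear_rate(t):
    """d log(kappa)/dt for the linear schedule kappa(t) = 1 - t."""
    return -1.0 / (1.0 - t)

def exponential_rate(t, c=10.0):
    """d log(kappa)/dt for the exponential schedule kappa(t) = exp(-c t)."""
    return -c

def distance_after(log_kappa_rate, t_end, num_steps=500):
    """Euler-integrate d(dist)/dt = rate(t) * dist from dist = 1 up to t_end.

    num_steps is the number of steps over the full interval [0, 1].
    """
    dt = 1.0 / num_steps
    dist = 1.0
    for i in range(int(round(t_end * num_steps))):
        dist += dt * log_kappa_rate(i * dt) * dist
    return dist

halfway_linear = distance_after(linear_rate, t_end=0.5)
halfway_exponential = distance_after(exponential_rate, t_end=0.5)
```

At $t = 0.5$ the linear schedule has covered half of the initial distance, whereas the exponential schedule with $c = 10$ has already covered more than 99% of it, consistent with the intuition that rotations settle early while translations continue to move linearly.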
The vector field in Eq. (28) matches the vector field in FoldFlow when inference annealing is performed (Bose et al., 2023). However, their choice of scaling was attributed to normalizing the predicted vector field rather than to the schedule. Indeed, they proposed to linearly scale up the learnt vector field via $\lambda(t) = (1 - t)c$ at sampling time, i.e. to simulate the following ODE:
$$dr_t = \lambda(t)\, \hat{v}(r_t, t)\, dt.$$
However, as hinted at earlier, this is equivalent to using at sampling time a different vector field $\tilde{v}(r_t, t)$, induced by an exponential schedule $\kappa(t) = e^{-ct}$, instead of the linear schedule $\kappa(t) = 1 - t$ (that the neural network $\hat{r}^\theta_1$ was trained with). Indeed we have
$$\tilde{v}(r_t, t) = \lambda(t)\, \hat{v}(r_t, t) = (1 - t)\, c\, \frac{\log_{r_t}(\hat{r}^\theta_1)}{1 - t} = c \log_{r_t}(\hat{r}^\theta_1).$$

D Motif guidance details
For the sake of completeness, we derive in this section the guidance term in Eq. (8) for the flow matching setting. In particular, we want to derive the conditional vector field $v(x_t, t|y)$ in terms of the unconditional vector field $v(x_t, t)$ and the correction term $\nabla \log p_t(y|x_t)$. Beware, in the following we adopt the time notation from diffusion models, i.e. $t = 0$ for denoised data to $t = 1$ for fully noised data. We therefore need to swap $t \to 1 - t$ in the end results to revert to the flow matching notation.
Let's consider the process associated with the following noising stochastic differential equation (SDE)
$$dx_t = f(x_t, t)\,dt + g(t)\,dB_t,$$
which admits the following time-reversal denoising process
$$dx_t = \left[f(x_t, t) - g(t)^2 \nabla \log p_t(x_t)\right] dt + g(t)\,d\bar{B}_t.$$
Thanks to the Fokker-Planck equation, we know that the following ordinary differential equation admits the same marginals as the SDE in Eq. (32):
$$dx_t = \left[f(x_t, t) - \tfrac{1}{2} g(t)^2 \nabla \log p_t(x_t)\right] dt = v(x_t, t)\,dt,$$
with $v(x_t, t)$ being the probability flow vector field. Now, conditioning on some observation $y$ and applying Bayes rule, we have
$$v(x_t, t|y) = f(x_t, t) - \tfrac{1}{2} g(t)^2 \nabla \log p_t(x_t|y) = v(x_t, t) - \tfrac{1}{2} g(t)^2 \nabla \log p_t(y|x_t).$$
Figure 7: Schematic of computing motif-scaffolding designability. "Generative model" is a stand-in for the method used to generate scaffolds conditioned on the motif. From there, we use ProteinMPNN (Dauparas et al., 2022) to design the sequence, then use AlphaFold2 (AF2) (Jumper et al., 2021) to predict the structure of the sequence. The RMSD is calculated on the scaffold (blue) and motif (red) separately with alignments. A generated scaffold passes designability if the scaffold RMSD < 2.0 and motif RMSD < 1.0.

E Designability
We provide details of the designability metric for motif-scaffolding and unconditional backbone generation used in prior works (Watson et al., 2023; Wu et al., 2023). The quality of a backbone structure is nuanced and difficult to capture with a single metric. One approach that has proven reliable in protein design is to use a highly accurate protein structure prediction network to recapitulate the structure after the sequence is inferred. Prior works (Bennett et al., 2023; Wang et al., 2021) found that the best method for filtering backbones to use in wet-lab experiments was the combination of ProteinMPNN (Dauparas et al., 2022) to generate the sequences and AlphaFold2 (AF2) (Jumper et al., 2021) to recapitulate the structure. We use the same procedure to determine the computational success of our backbone samples, which we describe next. As always, we caution that these results are computational and may not transfer to wet-lab validation. While ESMFold (Lin et al., 2023) may be used in place of AF2, we choose to follow the setting of RFdiffusion as closely as possible.
We refer to sampled backbones as backbones generated from our generative model. Following RFdiffusion, we use ProteinMPNN at temperature 0.1 to generate 8 sequences for each backbone in motif-scaffolding and unconditional backbone generation. In motif-scaffolding, the motif amino acids are kept fixed: ProteinMPNN only generates amino acids for the scaffold. The predicted backbone of each sequence is obtained with the fourth model of the five-model ensemble used in AF2, with 0 recycles, no relaxation, and no multiple sequence alignment (MSA); that is, the MSA is populated only with the query sequence. Fig. 7 provides a schematic of how we compute designability for motif-scaffolding. A sampled backbone is deemed successful, or designable, according to the following task-dependent criteria:
• Unconditional backbone generation: successful if the root mean squared deviation (RMSD) of all the backbone atoms is < 2.0 Å after global alignment of the carbon alpha positions.
• Motif-scaffolding: successful if the RMSD of motif atoms is < 1.0 Å after alignment on the motif carbon alpha positions. Additionally, the RMSD of the scaffold atoms must be < 2.0 Å after alignment on the scaffold carbon alpha positions.
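The two criteria above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the evaluation code used for the reported results: the function names, the boolean `motif_mask`, and the use of the Kabsch algorithm for the alignment are all assumptions. For brevity the example fits and evaluates on carbon alpha coordinates only, although `aligned_rmsd` also accepts a separate set of atoms to evaluate on. It checks a single predicted structure and does not address how the 8 sequences per backbone are aggregated.

```python
import numpy as np


def aligned_rmsd(pred_ca, ref_ca, pred_atoms=None, ref_atoms=None):
    """RMSD after a rigid alignment fitted on C-alpha coordinates.

    pred_ca, ref_ca: (N, 3) arrays used to fit the alignment (Kabsch).
    pred_atoms, ref_atoms: optional (M, 3) arrays on which the RMSD is
    reported; they default to the C-alpha coordinates themselves.
    """
    if pred_atoms is None or ref_atoms is None:
        pred_atoms, ref_atoms = pred_ca, ref_ca
    mu_p, mu_r = pred_ca.mean(axis=0), ref_ca.mean(axis=0)
    # Kabsch: SVD of the covariance between the centered point sets.
    u, _, vt = np.linalg.svd((pred_ca - mu_p).T @ (ref_ca - mu_r))
    # Flip the last axis if needed so the result is a proper rotation.
    d = np.sign(np.linalg.det(u @ vt))
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    diff = (pred_atoms - mu_p) @ rot - (ref_atoms - mu_r)
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


def is_designable_unconditional(sample_ca, pred_ca, cutoff=2.0):
    """Global criterion: RMSD below the cutoff (in Angstrom)."""
    return aligned_rmsd(pred_ca, sample_ca) < cutoff


def is_designable_motif_scaffold(sample_ca, pred_ca, motif_mask,
                                 motif_cutoff=1.0, scaffold_cutoff=2.0):
    """Motif and scaffold are aligned and scored separately."""
    motif_mask = np.asarray(motif_mask, dtype=bool)
    motif_rmsd = aligned_rmsd(pred_ca[motif_mask], sample_ca[motif_mask])
    scaffold_rmsd = aligned_rmsd(pred_ca[~motif_mask], sample_ca[~motif_mask])
    return motif_rmsd < motif_cutoff and scaffold_rmsd < scaffold_cutoff
```

Aligning the motif and the scaffold separately means that a prediction with a well-formed scaffold but a distorted motif still fails, which matches the stricter 1.0 Å threshold placed on the motif.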

F FrameFlow unconditional results
We present backbone generation results of the unconditional FrameFlow model used in FrameFlow-guidance. We do not perform an in-depth analysis since this task is not the focus of our work. Characterizing the backbone generation performance ensures that we are using a reliable unconditional model for motif-scaffolding. We evaluate the unconditionally trained FrameFlow model by sampling 100 backbones for each of the lengths 70, 100, 200, and 300, as done in RFdiffusion. The results are shown in Tab. 2. We find that FrameFlow achieves slightly worse designability but improved novelty. We conclude that FrameFlow achieves strong unconditional backbone generation results that are on par with RFdiffusion, a current state-of-the-art unconditional diffusion model. We perform secondary structure analysis of the unconditional samples in Fig. 8. We find that FrameFlow has a tendency to sample more alpha-helical structures than the data distribution but still has roughly the same coverage of structures. Future work could investigate the cause of this helical tendency and improve the secondary structure distribution of samples.

G FrameFlow motif-scaffolding analysis
In this section, we provide additional analysis of the motif-scaffolding results in Sec. 5.2. Our focus is on analyzing the two FrameFlow approaches to motif-scaffolding: motif amortization and motif guidance. The first analysis is the empirical cumulative distribution function (ECDF) of the motif and scaffold RMSD, shown in Fig. 9. We find that the main advantage of amortization over guidance lies in the motif RMSD: a higher percentage of its samples pass the motif RMSD threshold. Amortization also has better scaffold RMSD, but the gap is smaller than for motif RMSD. The ECDF curves are roughly the same for both methods.
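As a small illustration of the ECDF analysis, the sketch below computes an ECDF from a list of RMSD values and reads off the fraction of samples under a threshold. The function names and the example values are hypothetical and are not taken from the reported experiments.

```python
import numpy as np


def ecdf(values):
    """Return sorted values x and the ECDF heights y, with y[i] = (i + 1) / n."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


def fraction_below(values, threshold):
    """Fraction of samples strictly below the threshold (e.g. 1.0 for motif RMSD)."""
    return float(np.mean(np.asarray(values, dtype=float) < threshold))


# Hypothetical motif RMSD values (in Angstrom) for four samples.
motif_rmsd = [0.5, 1.5, 0.8, 2.0]
x, y = ecdf(motif_rmsd)
passing = fraction_below(motif_rmsd, 1.0)
```

Reading the ECDF at the designability threshold gives the pass rate for that criterion, which is how the motif and scaffold curves of two methods can be compared at a glance.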
Tab. 3 shows the average pairwise TM-score over all designable scaffolds per motif. We report this for each method and find that our FrameFlow approaches achieve the lowest average pairwise TM-score in 19 out of 24 motifs. The average pairwise TM-score is meant to complement the cluster criterion for diversity in cases where clustering leads to pathological behaviors. Both metrics have their strengths and weaknesses, but together they provide more detail on sample diversity.
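The averaging step of this diversity metric can be sketched as follows. The example assumes a precomputed square matrix of pairwise TM-scores between the designable scaffolds of one motif; computing the TM-scores themselves requires a structural alignment tool and is not shown. The function name is hypothetical, and symmetrizing the matrix is an assumption made here because a TM-score can depend on which structure is used for normalization.

```python
import numpy as np


def mean_pairwise_score(score_matrix):
    """Average of the off-diagonal pairwise scores (lower means more diverse).

    score_matrix: (n, n) array where entry (i, j) is the TM-score between
    scaffolds i and j. Returns NaN if fewer than two scaffolds are given.
    """
    s = np.asarray(score_matrix, dtype=float)
    n = s.shape[0]
    if n < 2:
        return float("nan")
    # Assumption: average the two directions so each pair is counted once.
    sym = 0.5 * (s + s.T)
    upper = np.triu_indices(n, k=1)  # indices strictly above the diagonal
    return float(sym[upper].mean())


# Hypothetical TM-scores for three designable scaffolds of one motif.
scores = [[1.0, 0.4, 0.6],
          [0.4, 1.0, 0.2],
          [0.6, 0.2, 1.0]]
avg = mean_pairwise_score(scores)
```

Excluding the diagonal matters because each scaffold has a TM-score of 1 with itself, which would otherwise inflate the average for motifs with few designable scaffolds.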

Figure 5: Secondary structure analysis. 2D kernel density plots of secondary structure composition of designable motif-scaffolds from FrameFlow-amortization and RFdiffusion. RFdiffusion tends to generate mostly helical scaffolds, while FrameFlow-amortization generates many more scaffolds with strands.

Figure 8: Secondary structure analysis of unconditional samples. Using FrameFlow, we sample 100 proteins for each of the lengths 70, 100, 200, and 300. The 2D kernel density estimation plot of the secondary structure composition is shown on the left. On the right, we show the secondary structure composition of all proteins of length 70, 100, 200, and 300 in the training set. We find that FrameFlow has a tendency to sample more alpha-helical structures than the data distribution.

Figure 10: Closest motif-scaffolds from FrameFlow-amortization on the three motifs it fails to solve. We find that a motif RMSD > 1.0 Å is the reason the method fails to pass designability.